Add per-attempt LLM spans under call-level retry (0050) #170

Merged: chris-colinsky merged 10 commits into main from feature/0050-per-attempt-llm-spans on Jun 19, 2026

@chris-colinsky

Summary

Completes proposal 0050 by implementing the call-level-retry per-attempt LLM span surface (observability §5.5 / llm-provider §7.1). 0050 shipped partially in v0.14.0 (failure-isolation middleware and the complete(retry=...) loop); this branch lands the deferred piece: under call-level retry, the OTel observer now emits one openarmature.llm.complete span per attempt rather than one per call.

What changed

  • New per-attempt event. A python-internal LlmRetryAttemptEvent (frozen, exported from openarmature.graph) is dispatched once per in-call attempt, carrying that attempt's identity / scoping, request-side fields, and outcome. The error_category field discriminates the outcome: it is None on success and set on failure.
  • Provider emit. OpenAIProvider.complete() dispatches one LlmRetryAttemptEvent per attempt, including the single attempt of a no-retry call (at index 0), with per-attempt latency that excludes backoff. The terminal LlmCompletionEvent / LlmFailedEvent are unchanged: still exactly one per call.
  • Observer render. The OTel observer renders the openarmature.llm.complete span(s) solely from LlmRetryAttemptEvent, each tagged openarmature.llm.attempt_index (0..N-1). A failed intermediate attempt carries ERROR plus the §4 category plus the request-side attributes; the final (or single) attempt carries the full §5.5 response surface. The two terminal events no longer drive the OTel span; they stay on the queue for the Langfuse mapping and payload / latency consumers. This collapses the previous completion and failed handlers into one.
  • Langfuse unchanged. The Langfuse observer ignores LlmRetryAttemptEvent and still renders one terminal Generation per call.
  • Manifest, docs, changelog. conformance.toml flips 0050 to implemented (since 0.15.0); the observability concepts page documents the attribute, the per-attempt span behavior, and the enricher / consuming-event implications; a 0.15.0 changelog section lands, also backfilling the 0061 detached-trace span entry.
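
The provider-side emit described above can be sketched roughly as follows. This is a minimal illustration, not the code in OpenAIProvider.complete(): AttemptEvent, TransientError, complete_with_retry, and the emit callback are hypothetical stand-ins, and the real LlmRetryAttemptEvent carries many more fields. The sketch shows two of the stated properties: one event per attempt (including a lone attempt at index 0), and per-attempt latency measured without the backoff sleep.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AttemptEvent:
    """Hypothetical stand-in for LlmRetryAttemptEvent (the real field set is larger)."""
    call_id: str
    attempt_index: int
    latency_ms: float
    error_category: Optional[str] = None  # None means the attempt succeeded


class TransientError(Exception):
    """Hypothetical marker for a retryable provider failure."""


def complete_with_retry(
    call_id: str,
    send: Callable[[], str],
    emit: Callable[[AttemptEvent], None],
    max_attempts: int = 3,
    backoff_s: float = 0.0,
) -> str:
    for index in range(max_attempts):
        start = time.perf_counter()
        try:
            result = send()
        except TransientError:
            # Latency is captured before any sleep, so it excludes backoff.
            latency = (time.perf_counter() - start) * 1000.0
            emit(AttemptEvent(call_id, index, latency, error_category="transient"))
            if index == max_attempts - 1:
                raise  # retries exhausted: the failure propagates to the caller
            time.sleep(backoff_s)
            continue
        latency = (time.perf_counter() - start) * 1000.0
        emit(AttemptEvent(call_id, index, latency))
        return result
    raise ValueError("max_attempts must be at least 1")


# Usage: the first attempt fails transiently, the second succeeds.
outcomes = iter([TransientError("rate limited"), "ok"])


def flaky_send() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


events: list = []
result = complete_with_retry("call-1", flaky_send, events.append)
print(result, [(e.attempt_index, e.error_category) for e in events])
```

In this sketch a non-transient exception would propagate without an attempt event; the PR's non-transient fixture suggests the real provider handles that case as well, which is omitted here for brevity.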

Design notes

LlmRetryAttemptEvent is python-internal, not a spec-normative event type. The per-attempt span contract is the already-accepted observability §5.5 (one span per attempt, openarmature.llm.attempt_index 0..N-1); §5.5 does not pin which internal event the observer renders from, so making this event the sole span source is an implementation choice. The guardrail: each per-attempt span carries the full §5.5 attribute surface, verified by fixtures 057 and 016-021 / 040-042 staying green. For Langfuse, terminal-Generation-per-call is the intended shape; §8 is currently silent on call-level retry, and a spec-side clarification to pin it is tracked (non-blocking).
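
To make the "sole span source" choice concrete, here is a small sketch of an observer that builds span records only from per-attempt events and ignores terminal events. It does not use the OpenTelemetry SDK; SpanRecord, SketchObserver, the three event classes, and the example.* attribute keys are hypothetical. Only the span name openarmature.llm.complete and the attribute openarmature.llm.attempt_index are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class AttemptEvent:  # hypothetical stand-in for LlmRetryAttemptEvent
    attempt_index: int
    model: str
    error_category: Optional[str] = None
    output_tokens: Optional[int] = None


@dataclass(frozen=True)
class CompletionEvent:  # hypothetical stand-in for LlmCompletionEvent
    model: str


@dataclass(frozen=True)
class FailedEvent:  # hypothetical stand-in for LlmFailedEvent
    model: str
    error_category: str


@dataclass
class SpanRecord:  # plain record used instead of a real OTel span
    name: str
    status: str
    attributes: dict = field(default_factory=dict)


class SketchObserver:
    def __init__(self) -> None:
        self.spans: list = []

    def on_event(self, event: object) -> None:
        if not isinstance(event, AttemptEvent):
            return  # terminal events never create a span in this sketch
        attrs = {
            "openarmature.llm.attempt_index": event.attempt_index,
            "example.model": event.model,  # illustrative request-side attribute
        }
        if event.error_category is not None:
            attrs["example.error_category"] = event.error_category
            status = "ERROR"
        else:
            attrs["example.output_tokens"] = event.output_tokens  # response side
            status = "OK"
        self.spans.append(SpanRecord("openarmature.llm.complete", status, attrs))


# One failed attempt, one successful attempt, then the terminal event.
observer = SketchObserver()
for ev in (
    AttemptEvent(0, "m", error_category="transient"),
    AttemptEvent(1, "m", output_tokens=12),
    CompletionEvent("m"),
):
    observer.on_event(ev)

# Terminal events alone produce no spans.
terminal_only = SketchObserver()
terminal_only.on_event(CompletionEvent("m"))
terminal_only.on_event(FailedEvent("m", "transient"))
```

The second observer mirrors the shape of the regression test mentioned under Testing: feeding only terminal events should leave the span list empty.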

Testing

  • Spec conformance fixtures 056-058 (transient-then-success, exhaustion, non-transient) driven through the provider plus OTel observer; obs-057 (single-attempt) wired in the conformance harness.
  • New regression test asserting terminal events produce zero OTel spans.
  • Full suite green (1326 passed); pyright and mkdocs clean.

Notes

  • The 0.15.0 changelog date is tentative pending the release tag.
  • A separate follow-up will normalize conformance.toml's proposal note style; not in this PR.

Flip conformance.toml [proposals."0050"] partial -> implemented
(since 0.15.0): the call-level-retry per-attempt span surface now
ships.

Document the openarmature.llm.attempt_index attribute and the
per-attempt span behavior in the observability concepts page, plus
notes that span enrichers receive LlmRetryAttemptEvent on the LLM
span and that the bundled provider dispatches that internal event
alongside the unchanged terminal events.

Add the 0.15.0 changelog section covering this work and backfilling
the 0061 detached-trace invocation span (which landed without an
entry), plus the v0.60.0 -> v0.61.0 spec-pin bullet.

_build_llm_retry_attempt_event constructed a full LlmRetryAttemptEvent
twice, repeating ~18 shared identity, scoping, and request-side fields
across the success and failure branches. Hoist them into one base dict
and splat it, leaving each branch to add only its outcome fields. No
behavior change.
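
A rough sketch of the hoist-and-splat pattern this commit describes, under the assumption of a much smaller event than the real one. AttemptEvent, build_attempt_event, and the field names are hypothetical; the point is only that the shared fields are built once and each branch adds its outcome fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttemptEvent:  # hypothetical, far fewer fields than the real event
    call_id: str
    model: str
    attempt_index: int
    latency_ms: float
    error_category: Optional[str] = None
    output_text: Optional[str] = None


def build_attempt_event(
    call_id: str,
    model: str,
    attempt_index: int,
    latency_ms: float,
    *,
    error_category: Optional[str] = None,
    output_text: Optional[str] = None,
) -> AttemptEvent:
    # Shared identity and request-side fields are assembled once.
    base = {
        "call_id": call_id,
        "model": model,
        "attempt_index": attempt_index,
        "latency_ms": latency_ms,
    }
    if error_category is not None:
        # Failure branch adds only its outcome field.
        return AttemptEvent(**base, error_category=error_category)
    # Success branch adds only its outcome field.
    return AttemptEvent(**base, output_text=output_text)


ok_event = build_attempt_event("c1", "m", 0, 4.2, output_text="hi")
failed_event = build_attempt_event("c1", "m", 0, 4.2, error_category="transient")
```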

The OTel observer now renders the LLM span solely from the per-attempt
LlmRetryAttemptEvent; terminal LlmCompletionEvent / LlmFailedEvent are
ignored. Add a regression test feeding both terminal events and
asserting zero openarmature.llm.complete spans, guarding against
reintroducing the terminal-event span path.

Also fix a stale docstring in _drive_llm_span_with_cached_tokens that
still referenced "typed LlmCompletionEvent".

Copilot AI review requested due to automatic review settings June 19, 2026 15:44

Copilot AI left a comment

Pull request overview

Implements proposal 0050’s observability §5.5 “per-attempt LLM spans” by introducing a new per-attempt internal event type and switching the OTel observer to render openarmature.llm.complete spans exclusively from that per-attempt event (one span per call-level retry attempt), while keeping the terminal LlmCompletionEvent / LlmFailedEvent as one-per-call for non-OTel consumers.

Changes:

  • Add LlmRetryAttemptEvent and dispatch it once per in-call attempt from OpenAIProvider.complete() (attempt latency excludes backoff).
  • Update OTel observer to create per-attempt openarmature.llm.complete spans from LlmRetryAttemptEvent and ignore terminal LLM events for span rendering; Langfuse ignores the new per-attempt event.
  • Update tests, conformance harness behavior, docs, conformance manifest, and changelog to reflect the per-attempt span contract and proposal 0050 being fully implemented.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/unit/test_observability_otel.py | Updates unit tests to drive OTel spans from per-attempt events; adds regression coverage for ignoring terminal events; adds fixture-driven per-attempt span assertions. |
| tests/unit/test_llm_provider.py | Updates provider emission-shape test to expect per-attempt event followed by terminal event on success. |
| tests/conformance/test_observability.py | Wires new observability fixture; excludes per-attempt internal event from conformance collector stream. |
| tests/conformance/test_llm_provider.py | Notes that call-level retry fixtures’ per-attempt spans are asserted in OTel unit tests rather than the provider conformance harness. |
| tests/_helpers/typed_event.py | Adds helper for constructing LlmRetryAttemptEvent for tests. |
| src/openarmature/observability/otel/observer.py | Renders openarmature.llm.complete spans from LlmRetryAttemptEvent (one per attempt), and ignores terminal LLM events for span creation. |
| src/openarmature/observability/langfuse/observer.py | Explicitly ignores LlmRetryAttemptEvent to keep one Generation per call from terminal events. |
| src/openarmature/observability/correlation.py | Extends dispatch/observer event unions to include LlmRetryAttemptEvent. |
| src/openarmature/llm/providers/openai.py | Emits per-attempt events within the call-level retry loop via a callback, keeping terminal event behavior unchanged. |
| src/openarmature/graph/observer.py | Extends ObserverEvent union and docs to include the per-attempt internal event. |
| src/openarmature/graph/events.py | Adds the LlmRetryAttemptEvent dataclass and exports it from openarmature.graph.events. |
| docs/concepts/observability.md | Documents per-attempt spans, openarmature.llm.attempt_index, and enricher/consumer implications. |
| conformance.toml | Marks proposal 0050 as implemented since 0.15.0 with updated narrative. |
| CHANGELOG.md | Adds 0.15.0 entries describing per-attempt spans and detached-trace invocation span; records spec pin advance. |

Comment thread src/openarmature/graph/events.py
Comment thread tests/conformance/test_observability.py
PR #170 Copilot review:

- Re-export LlmRetryAttemptEvent from the openarmature.graph package
  (import block + __all__), matching the sibling LlmCompletionEvent /
  LlmFailedEvent so the documented observer import path works.
- Replace the brittle type(event).__name__ name match with an
  isinstance check in the conformance _TypedEventCollector; the
  filter_event_type string comparison stays as-is.
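
As a general illustration of why a class-name comparison can be more brittle than isinstance, consider the sketch below. The classes and helper functions are hypothetical and are not the conformance _TypedEventCollector itself. A name match misses subclasses and breaks silently if the class is renamed, while isinstance follows the type hierarchy.

```python
class BaseEvent:
    """Hypothetical common base for events."""


class AttemptEvent(BaseEvent):
    """Hypothetical stand-in for LlmRetryAttemptEvent."""


class TracedAttemptEvent(AttemptEvent):
    """Hypothetical subclass, e.g. one introduced by a test helper."""


def is_attempt_by_name(event: object) -> bool:
    # Compares a string: subclasses and renamed classes silently stop matching.
    return type(event).__name__ == "AttemptEvent"


def is_attempt_by_type(event: object) -> bool:
    # Follows the class hierarchy, and a rename shows up as an import error.
    return isinstance(event, AttemptEvent)


subclass_event = TracedAttemptEvent()
by_name = is_attempt_by_name(subclass_event)
by_type = is_attempt_by_type(subclass_event)
```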
@chris-colinsky chris-colinsky merged commit 7224e30 into main Jun 19, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the feature/0050-per-attempt-llm-spans branch June 19, 2026 16:21

2 participants